feat: Issue Deduplication #11

Merged: 8 commits merged into development on Sep 16, 2024

Conversation

shiv810
Collaborator

@shiv810 shiv810 commented Sep 13, 2024

Resolves #6

@shiv810 shiv810 changed the base branch from development to main September 13, 2024 14:30
@shiv810 shiv810 changed the base branch from main to development September 13, 2024 14:30
Contributor

Unused types (1)

Filename types
src/adapters/supabase/helpers/issues.ts IssueType

@shiv810
Collaborator Author

shiv810 commented Sep 13, 2024

@0x4007 I have put together a few examples. Let me know if you have more or want to test more.

Screen.Recording.2024-09-13.at.7.08.57.PM-1.mov

Member

@0x4007 0x4007 left a comment

  • Adding labels is out of scope. Don't do that. Close it as unplanned, don't add any labels.
  • Add a match percentage as well when any are listed.
  • How did you generate the test cases and determine their percentage similarity?

import { IssueSimilaritySearchResult } from "../adapters/supabase/helpers/issues";
import { Context } from "../types";
const MATCH_THRESHOLD = 0.95;
const WARNING_THRESHOLD = 0.5;
Member

Why did you do 50%?

Collaborator Author

A cosine similarity of 0.75 appears too tight a cutoff for identifying similar issues. I tested it with a few examples and saw some likely misses. For similar issues, the similarity was typically either above 75% (in line with the 95% category) or around 60%. I therefore experimented with a 50% threshold, which seemed to work well.
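To make the trade-off in this thread concrete, here is a minimal sketch of how a single similarity score could be bucketed against two thresholds. The function name `classifySimilarity`, the `Verdict` type, and the choice of inclusive comparison (`>=`) are assumptions for illustration; the PR only shows the two threshold constants.

```typescript
// Hypothetical helper: only the two threshold constants appear in the PR diff.
type Verdict = "match" | "warning" | "none";

function classifySimilarity(
  similarity: number,
  matchThreshold: number,
  warningThreshold: number
): Verdict {
  // Assumption: thresholds are inclusive lower bounds.
  if (similarity >= matchThreshold) return "match";
  if (similarity >= warningThreshold) return "warning";
  return "none";
}

// A score in the "around 60%" cluster mentioned above:
const score = 0.6;
console.log(classifySimilarity(score, 0.95, 0.5)); // flagged as a warning
console.log(classifySimilarity(score, 0.95, 0.75)); // not flagged
```

With a 0.5 warning threshold the 60% cluster is surfaced; with 0.75 it is silently dropped, which is the behaviour being debated here.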

src/handlers/issue-deduplication.ts (outdated review thread, resolved)
src/handlers/issue-deduplication.ts (outdated review thread, resolved)
@shiv810
Collaborator Author

shiv810 commented Sep 14, 2024

  • Adding labels is out of scope. Don't do that. Close it as unplanned, don't add any labels.
  • Add a match percentage as well when any are listed.

Added. The cosine similarity is now displayed as a percentage after each issue in the list.

  • How did you generate the test cases and determine their percentage similarity?

I manually calculated and created test cases using embeddings and found their cosine similarity values.
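For readers who want to reproduce that kind of manual check, the sketch below computes cosine similarity between two embedding vectors and formats it as a percentage. It is not the plugin's code: `cosineSimilarity` and `toPercent` are hypothetical helpers, the vectors are toy values rather than real embeddings, and rounding to a whole percent is an assumption.

```typescript
// Hypothetical helpers for checking similarity values by hand.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: percentages are rounded to the nearest whole number.
function toPercent(similarity: number): string {
  return `${Math.round(similarity * 100)}%`;
}

// Toy vectors, not real embeddings.
console.log(toPercent(cosineSimilarity([1, 1], [1, 0])));
```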

@shiv810 shiv810 marked this pull request as ready for review September 14, 2024 17:02
@0x4007
Member

0x4007 commented Sep 14, 2024

Can you link your issue where you tested so we can see the results?

@shiv810
Collaborator Author

shiv810 commented Sep 14, 2024

Can you link your issue where you tested so we can see the results?

95%:

50%:

I have deployed the plugin at Plugin Link, if you wish to try it. The issues test values Link

@0x4007
Member

0x4007 commented Sep 14, 2024

Okay it seems like you aren't following the spec again.

Needs to list the similar results on every scenario.

Do 75% and 95% as a default.

@shiv810
Collaborator Author

shiv810 commented Sep 14, 2024

Okay it seems like you aren't following the spec again.

Needs to list the similar results on every scenario.

Fixed that. It now returns the similar issues in both the MATCH and WARNING cases.

Do 75% and 95% as a default.

Warning Threshold is 75% now.

95%:

75%:
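The requested behaviour (list similar issues with a match percentage in every scenario, and close as unplanned without labels on a match) could be sketched roughly as follows. This is an illustration under stated assumptions, not the merged handler: `SimilarIssue`, `DedupeDecision`, `decide`, the comment wording, and the example URLs are all hypothetical, and the function only returns a decision rather than calling the GitHub API.

```typescript
// Hypothetical shapes; the real IssueSimilaritySearchResult type is not shown in the PR.
interface SimilarIssue {
  url: string;
  similarity: number; // cosine similarity in [0, 1]
}

interface DedupeDecision {
  action: "close_as_unplanned" | "warn" | "none";
  comment: string | null;
}

function decide(
  candidates: SimilarIssue[],
  matchThreshold = 0.95,
  warningThreshold = 0.75
): DedupeDecision {
  // List everything at or above the warning threshold, most similar first.
  const listed = candidates
    .filter((c) => c.similarity >= warningThreshold)
    .sort((a, b) => b.similarity - a.similarity);

  const top = listed[0];
  if (!top) return { action: "none", comment: null };

  const isMatch = top.similarity >= matchThreshold;
  const heading = isMatch
    ? "This issue appears to be a duplicate of:"
    : "This issue may be similar to:";
  const lines = listed.map(
    (c) => `- ${c.url} (${Math.round(c.similarity * 100)}% match)`
  );

  // No labels are added in either case, per the review feedback.
  return {
    action: isMatch ? "close_as_unplanned" : "warn",
    comment: [heading, ...lines].join("\n"),
  };
}

const decision = decide([
  { url: "https://github.com/example/repo/issues/1", similarity: 0.96 },
  { url: "https://github.com/example/repo/issues/2", similarity: 0.8 },
  { url: "https://github.com/example/repo/issues/3", similarity: 0.4 },
]);
console.log(decision.action);
console.log(decision.comment);
```

A caller could map `close_as_unplanned` to closing the issue with the GitHub REST API's `state_reason: "not_planned"`; that wiring is left out here to keep the sketch free of network calls.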

@0x4007
Member

0x4007 commented Sep 14, 2024

Fixed that, it now returns the similar issue in both MATCH

Doesn't look like it in the first one

@shiv810
Collaborator Author

shiv810 commented Sep 14, 2024

  • First Comment

That's the first issue of that type, so it's expected not to have any similar issues. The second issue is the first time a similar issue is found with a similarity above 95%.

So the first issue would not satisfy any of the match conditions. The third issue does not have any issues similar to it, so it wouldn't get any message.
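The ordering argument above can be replayed with a small simulation in which each new issue is compared only against issues created before it. Everything here is a hypothetical sketch: the `replay` function, the `Embedded` type, and the two-dimensional toy vectors stand in for the real embedding search, which the PR does not show.

```typescript
// Hypothetical types and toy data, for illustration only.
type Embedded = { title: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// For each issue, in creation order, collect the earlier issues that are similar.
function replay(issues: Embedded[], threshold = 0.95): Map<string, string[]> {
  const seen: Embedded[] = [];
  const result = new Map<string, string[]>();
  for (const issue of issues) {
    const similar = seen
      .filter((prev) => cosine(prev.embedding, issue.embedding) >= threshold)
      .map((prev) => prev.title);
    result.set(issue.title, similar);
    seen.push(issue); // only now does this issue become visible to later ones
  }
  return result;
}

const outcome = replay([
  { title: "first", embedding: [1, 0] },
  { title: "second", embedding: [0.99, 0.05] }, // near-duplicate of "first"
  { title: "third", embedding: [0, 1] }, // unrelated
]);
console.log(outcome);
```

Under these assumptions only the second issue has anything to list: the first has no predecessors, and the third is unrelated to both.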

Member

@0x4007 0x4007 left a comment

Cool just needs configuration and I can merge.
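One way the requested configuration could look is sketched below: the two thresholds become settings with 95% and 75% defaults and a basic sanity check. The setting names `matchThreshold` and `warningThreshold`, the `DedupeSettings` interface, and `resolveSettings` are assumptions; the PR does not show the schema that was actually merged.

```typescript
// Hypothetical settings shape; the merged configuration schema is not shown in this thread.
interface DedupeSettings {
  matchThreshold: number;
  warningThreshold: number;
}

// Defaults follow the values requested in review: 95% and 75%.
const DEFAULT_SETTINGS: DedupeSettings = {
  matchThreshold: 0.95,
  warningThreshold: 0.75,
};

function resolveSettings(overrides: Partial<DedupeSettings> = {}): DedupeSettings {
  const settings: DedupeSettings = { ...DEFAULT_SETTINGS, ...overrides };
  for (const [name, value] of Object.entries(settings)) {
    if (typeof value !== "number" || value < 0 || value > 1) {
      throw new Error(`${name} must be a number between 0 and 1`);
    }
  }
  if (settings.warningThreshold > settings.matchThreshold) {
    throw new Error("warningThreshold must not exceed matchThreshold");
  }
  return settings;
}

console.log(resolveSettings());
console.log(resolveSettings({ warningThreshold: 0.5 }));
```

Keeping the defaults in one place means the hard-coded constants in the handler can be replaced by a single settings lookup.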

src/handlers/issue-deduplication.ts (outdated review thread, resolved)
@shiv810 shiv810 requested a review from 0x4007 September 16, 2024 10:57
Member

@0x4007 0x4007 left a comment

I'm assuming it all works. Code looks good.

@0x4007 0x4007 merged commit 1208f25 into ubiquity-os-marketplace:development Sep 16, 2024
2 checks passed
@ubiquity-os ubiquity-os bot mentioned this pull request Sep 16, 2024
Development

Successfully merging this pull request may close these issues.

Issue Dedupe
3 participants